Skyline Identification in Multi-Armed Bandits
Authors
Abstract
We introduce a variant of the classical PAC multi-armed bandit problem. There is an ordered set of n arms A[1], ..., A[n], each with some stochastic reward drawn from some unknown bounded distribution. The goal is to identify the skyline of the set A, consisting of all arms A[i] such that A[i] has larger expected reward than all lower-numbered arms A[1], ..., A[i−1]. We define a natural notion of an ε-approximate skyline and prove matching upper and lower bounds for identifying an ε-skyline. Specifically, we show that in order to identify an ε-skyline from among n arms with probability 1 − δ,
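As an illustration of the skyline definition only (not the paper's sampling algorithm), the following minimal sketch computes the exact skyline when the expected rewards are assumed to be known. The function name `skyline` and the input list `means` are hypothetical; in the bandit setting the means are unknown and must be estimated from samples, and the ε-approximate variant is not modeled here.

```python
from typing import List, Sequence


def skyline(means: Sequence[float]) -> List[int]:
    """Return the 1-based indices i whose mean strictly exceeds every earlier mean.

    Assumption: `means[i-1]` is the (known) expected reward of arm A[i].
    The first arm is always included, since it has no lower-numbered arms.
    """
    result: List[int] = []
    best = float("-inf")  # largest mean seen among lower-numbered arms
    for i, mu in enumerate(means, start=1):
        if mu > best:  # strictly larger than all earlier arms
            result.append(i)
            best = mu
    return result


if __name__ == "__main__":
    # Arm 4 ties with arm 2, so it is not strictly larger and is excluded.
    print(skyline([0.3, 0.5, 0.4, 0.5, 0.9]))
```

The single pass keeps a running maximum, so an arm belongs to the skyline exactly when it sets a new strict prefix maximum.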
Similar resources
Modal Bandits
Analyses of multi-armed bandits primarily presume that the value of an arm is its expected reward. We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions.
Multiple Identifications in Multi-Armed Bandits
We study the problem of identifying the top m arms in a multi-armed bandit game. Our proposed solution relies on a new algorithm based on successive rejects of the seemingly bad arms, and successive accepts of the good ones. This algorithmic contribution makes it possible to tackle other multiple identifications settings that were previously out of reach. In particular we show that this idea of successive...
Habilitation À Diriger Des Recherches De L'université Paris-est Agrégation Pac-bayésienne Et Bandits À Plusieurs Bras Pac-bayesian Aggregation and Multi-armed Bandits
Generic Exploration and K-armed Voting Bandits
We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of ...
The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits
We present Epoch-Greedy, an algorithm for contextual multi-armed bandits (also known as bandits with side information). Epoch-Greedy has the following properties: 1. No knowledge of a time horizon T is necessary. 2. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. 3. The regret scales as O(T^{2/3} S^{1/3}) or better (sometimes, much better). Here S is th...
Journal title:
- CoRR
Volume abs/1711.04213 Issue
Pages -
Publication date 2017